Requires data modeling and quantitative research in Transport, Infrastructure & Logistics
The driver is the most important element of the road traffic system, acting as both decision maker and participant. The state of a vehicle on the road is the final result of the driver's cognition, psychology, decision making, and execution. Studying vehicle behavior alone focuses on the outcome and ignores this process, so it cannot reveal the factors that lead a driver to behave in a particular way. It is therefore important to analyze drivers' car-following behavior. The project first processes real observed vehicle trajectories and calibrates the IDM against them. Then, a heuristic genetic algorithm is used to obtain the optimal set of parameters describing the driving characteristics of each driver. Finally, the relationships between the drivers' following-behavior parameters and the vehicle parameters are quantified, and a sensitivity analysis of the IDM is carried out to study the impact of driver behavior characteristics on traffic flow, which matters for the future traffic development of cities and for road safety.
In this project we are interested in the driving behavior of car drivers, so we model, optimize, and analyze vehicle-following data from the ZTD (Zen Traffic Data) platform. Our course design combines a car-following model with an optimization algorithm: the car-following model is the intelligent driver model (IDM), and the optimization algorithm is a genetic algorithm.
Author Vincent Yang: Data source search, data processing (Chapter 2), genetic algorithm (Chapter 3).
Author Eric Zhang: Correlation analysis using SPSS (Chapter 3), partial visualisation and data analysis in Chapter 4.
Author Shenshen Sun: Result analysis and partial visualisation in Chapter 5, research objective.
Author Shuai Wang: Partial visualisation and data analysis in Chapters 4 and 5, report layout.
In this project, we use the intelligent driver model (IDM) with an explicit reaction time and fit it to the driving trajectories of real car drivers. The mathematical framework is as follows. The model is given by Equation 1.
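In the symbols defined just below, the standard IDM with a reaction delay can be written as follows. The acceleration exponent 4 is the customary default and is an assumption here, since it is not among the five parameters calibrated in this report.

$$a_{\alpha}(t+\tau_{\alpha}) = a\left[1-\left(\frac{v_{\alpha}(t)}{v_0}\right)^{4}-\left(\frac{s^*\big(v_{\alpha}(t),\Delta v_{\alpha}(t)\big)}{S_{\alpha}(t)}\right)^{2}\right] \tag{1}$$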
where the driving state of vehicle $\alpha$ is controlled through its acceleration $a_{\alpha}(t+\tau_{\alpha})$, computed from the state at time $t$, and $\tau_{\alpha}$ denotes the driver's reaction time. $v_0$ denotes the free-flow speed of the vehicle and $S_{\alpha}(t)$ denotes the gap to the vehicle in front. $s^*$ is a distance consisting of three parts, given by Equation 2:
$s^*$ can be interpreted as a reference distance, composed of a static and a dynamic part. $S_0$ is the minimum standstill distance, and $v_{\alpha}(t)T$ is the speed of the vehicle multiplied by the expected time headway $T$. The third component is a safety distance based on the speed difference $\Delta v_{\alpha}(t)$: the distance the vehicle needs in order to avoid hitting the vehicle in front during non-emergency braking, that is, without decelerating harder than $b$. $a$ is the maximum acceleration of the vehicle and $b$ is its comfortable deceleration.
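In the standard IDM, this reference distance takes the form below. It is reconstructed from the three terms described in this section, so the exact notation should be read as an assumption; $\Delta v_{\alpha}(t)$ is taken as the approach rate, the speed of vehicle $\alpha$ minus the speed of the vehicle in front.

$$s^*\big(v_{\alpha}(t),\Delta v_{\alpha}(t)\big) = S_0 + v_{\alpha}(t)\,T + \frac{v_{\alpha}(t)\,\Delta v_{\alpha}(t)}{2\sqrt{ab}} \tag{2}$$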
We implemented the model according to the formulas above; the file name is Intelligent Driver Model for assignment1.py.
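A minimal sketch of how the two formulas could be coded is given below. It is not the content of the project file: the function names, the exponent `delta=4`, and the sign convention `dv = v - v_leader` are assumptions, and the reaction delay is left out for brevity (it would be handled by applying the acceleration computed at time $t$ at time $t+\tau_{\alpha}$).

```python
import math

def idm_desired_gap(v, dv, s0, T, a, b):
    """Reference distance s*: standstill gap + headway term + braking term."""
    return s0 + v * T + v * dv / (2.0 * math.sqrt(a * b))

def idm_acceleration(v, dv, gap, s0, T, a, b, v0, delta=4.0):
    """IDM acceleration for speed v, approach rate dv = v - v_leader, and gap."""
    s_star = idm_desired_gap(v, dv, s0, T, a, b)
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example using the calibrated values reported for the third vehicle in the
# sensitivity analysis of this report.
params = dict(s0=2.10, T=1.76, a=1.31, b=3.45, v0=47.50)

# Standing start on an almost empty road: acceleration is close to a.
acc_free = idm_acceleration(v=0.0, dv=0.0, gap=1e6, **params)

# Closing in quickly on a nearby leader: strong braking.
acc_close = idm_acceleration(v=20.0, dv=5.0, gap=10.0, **params)

print(acc_free, acc_close)
```

The two calls illustrate the limiting behavior of the model: with a very large gap the interaction term vanishes, and with a small gap and a positive approach rate the interaction term dominates.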
The vehicle trajectory data used in this assignment come from the Zen Traffic Data platform (https://zen-traffic-data.net/), which records the movement of all vehicles on a stretch of Hanshin Expressway Route 11 several kilometers long at 0.1-second intervals. Most sections of this stretch have two lanes, a passing lane and a driving lane, and there is an extra merging lane where it meets a merging gate.
The platform currently provides the trajectory data of 3412 vehicles. We wrote a program (Statistics information.py) to count the number of rows in the raw data, and another (find every car's time.py) to find the following time of every car, which gave an average following time of 165.5 seconds. The data include vehicle number, time, speed, lane, location, vehicle length, and other information; note that the vehicle length is rounded to 0.5 m. A portion of the Hanshin Expressway is shown in the figure below.
To apply the car-following model, we first need information about the vehicles in front and behind. Since the original data (L001_F001_trajectory.csv) do not contain the pairing of front and rear vehicles, we first pair the original data. We use the R pairing program (find_leaders.R) provided on the website to pair the vehicles, then remove all vehicles longer than 6.5 meters (Remove vehicles over 6.5 meters.py). This leaves 1793 paired vehicles (Paired_L001_F001_trajectory.csv).
We use the kilopost as the vehicle location. However, the kilopost is calculated from the latitude and longitude of the rear center of the vehicle, whereas the vehicle spacing $S_{\alpha}$ in the intelligent driver model is the distance from the rear of the preceding vehicle to the front of the following vehicle. The formula used to calculate $S_{\alpha}$ in this assignment is therefore Equation 3.
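Given that the kilopost marks the rear of each vehicle, one consistent way to write this spacing is shown below. This is a reconstruction and not necessarily the authors' exact expression: the absolute value is used because the direction in which the kilopost increases is not stated, and the length subtracted is assumed to be that of the following vehicle $\alpha$.

$$S_{\alpha} = \left|kp_{\alpha-1} - kp_{\alpha}\right| - length_{\alpha} \tag{3}$$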
where $ \alpha $ is the current vehicle, $ \alpha-1 $ is the previous vehicle, $kp$ denotes kilopost, and $length$ denotes vehicle length.
We need to calibrate the model parameters of 1793 vehicles based on the improved intelligent driver model.
To this end, we use a genetic algorithm (genetic algorithm.py) to fit the five parameters of the intelligent driver model, namely $S_0, T, a, b, v_0$, which represent the driving behavior of each driver. We use the geatpy library, which has a built-in genetic algorithm kernel, so we only need to set the parameters of the genetic algorithm.
Each candidate solution is represented by a set of real numbers, and its quality by a value indicating the goodness of fit of the model with that parameter set. In this assignment, we measure the deviation between the predicted car trajectory and the measured car trajectory as the root mean squared error between the measured position $y_{measured}(t)$ and the predicted position $y_{IDM}(t)$, as given in Equation 4:
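Written out, the root mean squared error over a trajectory takes the usual form below, where $N$ (a symbol introduced here for illustration) is the number of time steps $t_i$ at which the position was measured.

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(y_{measured}(t_i) - y_{IDM}(t_i)\big)^2} \tag{4}$$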
After determining the coding method and the fitness equation, we optimize the parameter set using a genetic algorithm.
To generate each new generation, we use the elite retention method. The initial generation is generated randomly within the range of values of each parameter. The population size is 20 individuals.
We then use two-point crossover to recombine the parameter sets, with a recombination probability of 0.7, and the mutation operator of the breeder GA to mutate them. The mutation probability is set to 1 divided by the number of decision variables.
Since there are five decision variables in total, the mutation probability is 0.2. The algorithm terminates after 50 evolutionary generations.
Finally, we check the quality of the optimized result (see next subsection). We also restrict each of the five calibration parameters to a range, both to speed up the algorithm and to keep the parameters within a reasonable interval. The ranges are as follows.
$S_0$: 1 to 8 m [Static Distance]
$T$: 0.5 to 5 s [Time Headway]
$a$: 0.5 to 6 m/s$^2$ [Maximum Acceleration]
$b$: 0.5 to 6 m/s$^2$ [Maximum Deceleration]
$v_0$: 0 to 50 m/s [Free-flow Speed]
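The settings above can be illustrated with a small hand-written genetic algorithm. This is a sketch in plain NumPy, not the geatpy code used in the project: the leader trajectory and the "measured" follower are synthetic, selection uses a binary tournament, and the breeder-GA mutation is replaced by a simple uniform resampling of the mutated gene. All function names are hypothetical.

```python
import math
import numpy as np

DT = 0.1                                        # s, sampling interval of the data
LOWER = np.array([1.0, 0.5, 0.5, 0.5, 0.0])     # S_0, T, a, b, v_0 lower bounds
UPPER = np.array([8.0, 5.0, 6.0, 6.0, 50.0])    # matching upper bounds

def simulate(params, lead_pos, lead_vel, x0, v_init):
    """Follower front position over time under the IDM (no reaction delay)."""
    s0, T, a, b, v0 = (float(p) for p in params)
    v0 = max(v0, 1e-3)                          # guard: the range allows v_0 = 0
    x, v = x0, v_init
    out = np.empty(len(lead_pos))
    for k in range(len(lead_pos)):
        out[k] = x
        gap = max(lead_pos[k] - x, 0.1)         # guard against a non-positive gap
        s_star = s0 + v * T + v * (v - lead_vel[k]) / (2.0 * math.sqrt(a * b))
        acc = a * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)
        v = max(v + acc * DT, 0.0)              # no reversing
        x += v * DT
    return out

def tournament(pop, scores, rng):
    """Binary tournament: the better of two random individuals."""
    i, j = rng.integers(len(pop), size=2)
    return pop[i] if scores[i] <= scores[j] else pop[j]

def run_ga(fitness, rng, pop_size=20, generations=50, p_cross=0.7):
    n = len(LOWER)
    p_mut = 1.0 / n                             # 0.2 for five decision variables
    pop = rng.uniform(LOWER, UPPER, size=(pop_size, n))
    scores = np.array([fitness(ind) for ind in pop])
    history = [float(scores.min())]
    for _ in range(generations):
        new_pop = [pop[scores.argmin()].copy()]  # elite retention
        while len(new_pop) < pop_size:
            child = tournament(pop, scores, rng).copy()
            other = tournament(pop, scores, rng)
            if rng.random() < p_cross:           # two-point crossover
                lo, hi = sorted(rng.choice(n + 1, size=2, replace=False))
                child[lo:hi] = other[lo:hi]
            mask = rng.random(n) < p_mut         # stand-in mutation operator
            child[mask] = rng.uniform(LOWER, UPPER)[mask]
            new_pop.append(child)
        pop = np.vstack(new_pop)
        scores = np.array([fitness(ind) for ind in pop])
        history.append(float(scores.min()))
    return pop[scores.argmin()], float(scores.min()), history

# Synthetic test case: a leader with an oscillating speed and a follower
# generated by the IDM itself with known parameters.
rng = np.random.default_rng(0)
t = np.arange(0.0, 20.0, DT)
lead_vel = 20.0 + 3.0 * np.sin(0.3 * t)
lead_pos = 40.0 + np.cumsum(lead_vel) * DT
true_params = np.array([2.0, 1.5, 1.5, 2.0, 30.0])
measured = simulate(true_params, lead_pos, lead_vel, x0=0.0, v_init=20.0)

def rmse(params):
    predicted = simulate(params, lead_pos, lead_vel, x0=0.0, v_init=20.0)
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

best, best_rmse, history = run_ga(rmse, rng)
print(best, best_rmse)
```

Because the best individual is copied unchanged into every new generation, the best error recorded in `history` can never increase from one generation to the next, which is the practical benefit of elite retention.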
Now we have the value of the objective function and the travel time of each vehicle, so we can calculate the error of each vehicle. We eliminate the vehicles with an error greater than 10, which leaves the data of 1231 vehicles (result.csv). Each remaining parameter set characterizes one driver's driving habits.
import pandas as pd
import plotly.express as px
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
# Load the calibrated parameters of the retained vehicles
file_path = "./result.csv"
df = pd.read_csv(file_path, delimiter=',')
df
From the parameters listed in the table above, seven (S, T, aMax, bMax, v0, length(car), front_length(car)) are selected, and a correlation analysis is carried out using SPSS.
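A correlation matrix of this kind can also be cross-checked in Python with `DataFrame.corr`. The sketch below uses a small synthetic table rather than result.csv, so the numbers it prints are illustrative only; the column names follow those used in this notebook, and the linear relation between `length` and `T` is invented for the demonstration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200

# Synthetic stand-in for a few columns of the result table.
length = rng.choice([3.5, 4.0, 4.5, 5.0, 5.5], size=n)
demo = pd.DataFrame({
    "length": length,
    "T": 0.5 + 0.2 * length + rng.normal(0.0, 0.1, size=n),  # invented relation
    "aMax": rng.uniform(0.5, 6.0, size=n),                   # unrelated column
})

# Pearson correlation between every pair of columns.
corr = demo.corr(method="pearson")
print(corr.round(2))
```

Applied to the real table, the same call on the seven selected columns would give a matrix that can be compared entry by entry with the SPSS output.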
fig = px.scatter_3d(df, x='T', y='v0', z='aMax', color='length')
fig.show()
# Add a helper column of ones so that vehicles can be counted by summing
df["number"] = 1
df
# Count the vehicles for each front-vehicle length
df_4 = df.groupby("front_length").agg({"number": "sum"})
df_4
fig = px.bar(df_4, x=df_4.index, y="number", text_auto=True)
fig.update_traces(textposition = "outside")
fig.show()
fig = px.box(df, x="front_length", y="T")
fig.show()
When the front vehicle length is between 3 and 6.5 m, the expected time headway ($T$) is fairly stable, with values fluctuating between approximately 1.30 and 1.50 s.
When the front vehicle is longer than 6.5 m, the expected time headway ($T$) fluctuates greatly, which may indicate that the driver cannot accurately judge the distance to the front vehicle.
fig = px.box(df, x="length", y="T", color='length')
fig.show()
As a general trend, the longer the car the driver is driving, the greater the expected time headway ($T$).
According to the results, the expected time headway ($T$) is least stable when the car length is 5.5 m, with a wide range (1.27 s to 3.12 s).
# T(Expected Time Headway) and Length Figure
fig = px.scatter(df, x="length", y="T", trendline="ols")
model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]
fig.data[0].name = 'data point'
fig.data[0].showlegend = True
fig.data[1].name = fig.data[1].name + 'y = ' + str(round(alpha, 2)) + ' + ' + str(round(beta, 2)) + 'x'
fig.data[1].showlegend = True
fig.show()
According to the regression line, the expected time headway ($T$) increases by about 0.2 s for every additional meter of car length.
file_path_45 = "./IDMdata_for_45.csv"
df_45 = pd.read_csv(file_path_45, delimiter=',')
# Keep every 20th row to thin out the scatter plot
file_45 = df_45.iloc[::20]
file_45
fig_45 = px.scatter(file_45, x='time', y="Acceleration", trendline="lowess", trendline_options=dict(frac=0.1))
fig_45.show()
# Time-space diagram of car No. 45 and the car in front of it
x = df_45['time']
y1 = df_45['following car position']
y2 = df_45['front car position']
plt.figure(figsize=(14, 10))
l1, = plt.plot(x, y1, linewidth=1)
l2, = plt.plot(x, y2, linewidth=1)
# Fonts for the title and the axis labels
font1 = {'family': 'serif', 'color': 'blue', 'size': 20}
font2 = {'family': 'serif', 'color': 'darkred', 'size': 15}
plt.legend((l1, l2), ['following car', 'front car'])
plt.title("Time and space map of car No.45 and its front car", fontdict=font1)
plt.xlabel("time(s)", fontdict=font2)
plt.ylabel("Position(m)", fontdict=font2)
plt.show()
After obtaining the five calibrated parameters of the 1232 vehicles, we carried out a sensitivity analysis of the IDM. A smoothly moving vehicle that follows the IDM was selected from the filtered data and analyzed with the control variable method: the value of one parameter is adjusted while the other parameters are left unchanged. The effect of the different parameter settings on the model is analyzed and the results are visualized.
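The one-parameter-at-a-time procedure can be sketched as follows. The leader speed profile, the initial gap, and the tested values of `s0` are hypothetical; the base parameters are the calibrated values reported for the third vehicle in this section, and the 0.1 s update step matches the one used in the project. The reaction delay is omitted, so this is an illustration of the method, not a reproduction of the figures.

```python
import numpy as np

DT = 0.1  # s, update step

def speed_profile(s0, T, a, b, v0, lead_vel, gap0, v_init):
    """Follower speed over time behind a leader with the given speed profile."""
    v, gap = v_init, gap0
    speeds = np.empty(len(lead_vel))
    for k, v_lead in enumerate(lead_vel):
        speeds[k] = v
        s_star = s0 + v * T + v * (v - v_lead) / (2.0 * np.sqrt(a * b))
        acc = a * (1.0 - (v / v0) ** 4 - (s_star / max(gap, 0.1)) ** 2)
        v_new = max(v + acc * DT, 0.0)          # no reversing
        gap += (v_lead - v_new) * DT            # gap grows if the leader is faster
        v = v_new
    return speeds

# Calibrated parameters of the third vehicle, used as the baseline.
base = dict(s0=2.10, T=1.76, a=1.31, b=3.45, v0=47.50)

# Hypothetical leader: 15 m/s for 60 s with a temporary slowdown to 10 m/s.
lead_vel = np.full(600, 15.0)
lead_vel[200:300] = 10.0

# Vary s0 only; every other parameter keeps its baseline value.
profiles = {}
for s0 in (1.0, 2.1, 4.0, 8.0):
    p = dict(base, s0=s0)
    profiles[s0] = speed_profile(**p, lead_vel=lead_vel, gap0=30.0, v_init=15.0)

# Largest speed deviation from the baseline run for each tested value.
baseline = profiles[2.1]
for s0, v in profiles.items():
    print(s0, float(np.max(np.abs(v - baseline))))
```

Repeating the loop with `T`, `a`, or `b` in place of `s0` gives the corresponding curves for the other parameters, and the printed maximum deviation offers a single number for comparing how strongly each parameter moves the speed profile.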
The third vehicle after filtering is selected as the target vehicle for sensitivity analysis.
The calibrated parameters of the third vehicle are: $$s_0=2.10,\ T=1.76,\ aMax=1.31,\ bMax=3.45,\ v_0=47.50$$ The vehicle speed is updated every 0.1 seconds in Python, and the speed of the vehicle at each time step is output. With the other parameters kept constant, the value of $s_0$ was adjusted to obtain the $v-t$ curves of the vehicle motion for different values of $s_0$, as shown in Figure 5.1.
file_path_S = "./S.csv"
df_S = pd.read_csv(file_path_S, delimiter=',')
df_S
# Plot the first six columns against t
fig_S = px.line(df_S, x='t', y=df_S.columns[0:6])
fig_S.update_layout(
    title={
        "text": "5.1 Sensitivity Analysis of Static Distance(S)",
        "y": 0.96,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top"
    },
    xaxis_title='t(s)',
    yaxis_title='V(m/s)'
)
fig_S.show()
As $s_0$ changes, the $v-t$ curve of vehicle 3 changes very little. Changing the value of $s_0$ therefore has little effect on the motion of a vehicle that follows the IDM; in other words, the IDM is not very sensitive to $s_0$.
With the other parameters kept constant, the value of $T$ was adjusted to obtain the $v-t$ curves of the vehicle motion for different values of $T$, as shown in Figure 5.2.
file_path_T = "./T.csv"
df_T = pd.read_csv(file_path_T, delimiter=',')
df_T
# Plot the first six columns against t
fig_T = px.line(df_T, x='t', y=df_T.columns[0:6])
fig_T.update_layout(
    title={
        "text": "5.2 Sensitivity Analysis of Time Headway(T)",
        "y": 0.96,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top"
    },
    xaxis_title='t(s)',
    yaxis_title='V(m/s)'
)
fig_T.show()
In Figure 5.2, the vehicle motion varies considerably with the time headway, indicating that the IDM is highly sensitive to $T$; the larger the value of $T$, the more stable the vehicle speed.
With the other parameters kept constant, the value of $aMax$ was adjusted to obtain the $v-t$ curves of the vehicle motion for different values of $aMax$, as shown in Figure 5.3.
file_path_A = "./A.csv"
df_A = pd.read_csv(file_path_A, delimiter=',')
df_A
# Plot the first five columns against t
fig_A = px.line(df_A, x='t', y=df_A.columns[0:5])
fig_A.update_layout(
    title={
        "text": "5.3 Sensitivity Analysis of Maximum Acceleration(aMax)",
        "y": 0.96,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top"
    },
    xaxis_title='t(s)',
    yaxis_title='V(m/s)'
)
fig_A.show()
Similarly, Figure 5.3 shows that the IDM is highly sensitive to the maximum acceleration: the smaller the value of $aMax$, the more stable the vehicle speed.
With the other parameters kept constant, the value of $bMax$ was adjusted to obtain the $v-t$ curves of the vehicle motion for different values of $bMax$, as shown in Figure 5.4.
file_path_B = "./B.csv"
df_B = pd.read_csv(file_path_B, delimiter=',')
df_B
# Plot the first six columns against t
fig_B = px.line(df_B, x='t', y=df_B.columns[0:6])
fig_B.update_layout(
    title={
        "text": "5.4 Sensitivity Analysis of Maximum Deceleration(bMax)",
        "y": 0.96,
        "x": 0.5,
        "xanchor": "center",
        "yanchor": "top"
    },
    xaxis_title='t(s)',
    yaxis_title='V(m/s)'
)
fig_B.show()
Similarly, Figure 5.4 shows that the IDM is highly sensitive to the maximum deceleration: the smaller the value of $bMax$, the more stable the vehicle speed.
From the above analysis of the four parameters, the IDM is only weakly sensitive to $s_0$ and strongly sensitive to $T,\ aMax,\ bMax$.